control variate
On Variance Reduction in Learning Mean Flows
One-step generative modeling has emerged as a leading approach to amortize the inference cost of diffusion and flow-matching models. Among distillation-free methods, MeanFlow training is notoriously unstable, with non-decreasing loss and unbounded gradient variance. In this work, we establish a theory that attributes this pathology to a misuse of the conditional velocity field: it plays two distinct statistical roles in the loss, both as an unbiased regression target and as a Monte Carlo control variate inside a Jacobi-vector product, with the original loss assigning the wrong coefficient to the latter. We derive the optimal coefficient in closed form, and show that a family of fixes in concurrent works corresponds to different practical realizations of the same optimum. A controlled sweep of this coefficient on two-dimensional benchmarks and on a latent Diffusion Transformer recovers the predicted bias-variance ordering. The optimal coefficient yields up to a %54 improvement in sample quality on two-dimensional benchmarks and a monotone FID trend at every matched-step DiT checkpoint. Crucially, the same DiT measurement also reveals a quantitative FID-MSE landscape mismatch: although gradient variance is minimized at an interior coefficient value, the coefficient that minimizes FID prefers the direct use of conditional velocity.
GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies
Phylogenetic inference, grounded in molecular evolution models, is essential for understanding the evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, which include tree topologies and evolutionary distances on branches, is crucial for accurately inferring species relationships from molecular data and tasks requiring variable marginalization. Variational Bayesian methods are key to developing scalable, practical models; however, it remains challenging to conduct phylogenetic inference without restricting the combinatorially vast number of possible tree topologies. In this work, we introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. Through practical considerations on design spaces and control variates for gradient estimations, our approach, GeoPhy, enables variational inference without limiting the topological candidates. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.
Stochastic Multi-Armed Bandits with Control Variates
This paper studies a new variant of the stochastic multi-armed bandits problem where auxiliary information about the arm rewards is available in the form of control variates. In many applications like queuing and wireless networks, the arm rewards are functions of some exogenous variables. The mean values of these variables are known a priori from historical data and can be used as control variates. Leveraging the theory of control variates, we obtain mean estimates with smaller variance and tighter confidence bounds. We develop an upper confidence bound based algorithm named UCB-CV and characterize the regret bounds in terms of the correlation between rewards and control variates when they follow a multivariate normal distribution. We also extend UCB-CV to other distributions using resampling methods like Jackknifing and Splitting. Experiments on synthetic problem instances validate performance guarantees of the proposed algorithms.
Baselines
As shown in the main text, under the assumption that the influence network is unbiased, our factor baselines are indeed valid control variates. We prove this result below, repeating the statement itself for posterity and providing a supplementary lemma on control variates as a restatement of known results. Let X, Y and Zbe random variables where the law of Xconditional on Z is denoted Pθ(X|Z), and Y is independent of X conditioned on Z; i.e. Then, we have that E[Y θln Pθ(X)] = 0. Proof. Factor baselines are valid control variates if GΣ is true to the MDP (i.e.
Control Variates for Slate Off-Policy Evaluation
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions, often termed slates. The problem is common to recommender systems and user-interface optimization, and it is particularly challenging because of the combinatorially-sized action space. Swaminathan et al. (2017) have proposed the pseudoinverse (PI) estimator under the assumption that the conditional mean rewards are additive in actions. Using control variates, we consider a large class of unbiased estimators that includes as specific cases the PI estimator and (asymptotically) its self-normalized variant. By optimizing over this class, we obtain new estimators with risk improvement guarantees over both the PI and the self-normalized PI estimators.